Exponential Distribution (expon)#
The exponential distribution models waiting times between events when events happen at a constant average rate and the process has no memory.
It’s a cornerstone of applied probability because it links directly to the Poisson process, appears as a special case of the Gamma distribution, and is the continuous analogue of the geometric distribution.
What you’ll learn#
what expon models and when it's appropriate
the PDF/CDF/survival/hazard functions (with LaTeX)
key moments, MGF/characteristic function, and entropy
how parameters change the shape (rate vs scale)
core derivations: expectation, variance, likelihood + MLE
NumPy-only sampling via inverse transform + Monte Carlo validation
SciPy usage: scipy.stats.expon (pdf, cdf, rvs, fit)
hypothesis testing, Bayesian updates, and generative modeling patterns
Notebook roadmap#
Title & classification
Intuition & motivation
Formal definition (PDF/CDF)
Moments & properties
Parameter interpretation
Derivations ((\mathbb{E}[X]), (\mathrm{Var}(X)), likelihood)
Sampling & simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy integration (scipy.stats.expon)
Statistical use cases
Pitfalls
Summary
import math
import numpy as np
import scipy
from scipy import stats
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
SEED = 7
rng = np.random.default_rng(SEED)
np.set_printoptions(precision=4, suppress=True)
print("numpy ", np.__version__)
print("scipy ", scipy.__version__)
print("plotly", plotly.__version__)
numpy 1.26.2
scipy 1.15.0
plotly 6.5.2
Prerequisites & notation#
Prerequisites
comfort with basic calculus (integration by parts)
basic probability (PDF/CDF, expectation)
Notation
There are two common parameterizations:
Rate (\lambda > 0): average number of events per unit time.
Scale (\theta > 0): mean waiting time, with (\theta = 1/\lambda).
In this notebook, we primarily use the rate (\lambda). When we use SciPy, we map to its parameterization:
scipy.stats.expon(loc=0, scale=θ), where (\theta = 1/\lambda).
1) Title & classification#
Name: expon (Exponential distribution)
Type: continuous
Support: (x \in [0, \infty))
Parameter space:
rate form: (\lambda \in (0, \infty))
scale form: (\theta \in (0, \infty)), where (\theta = 1/\lambda)
A shifted form sometimes appears with a location parameter (\text{loc}): (x \ge \text{loc}). SciPy exposes this as loc.
2) Intuition & motivation#
What it models#
The exponential distribution is the canonical model for a waiting time until the next event when:
events arrive at a constant average rate, and
the process is memoryless.
Typical real-world use cases#
Queueing / arrivals: time between customer arrivals (Poisson process assumption)
Reliability: time-to-failure under a constant hazard rate
Survival analysis: baseline model before introducing covariates or non-constant hazards
Networking: simplified models of packet arrival gaps
Key relations to other distributions#
Poisson process: if events arrive as a Poisson process with rate (\lambda), inter-arrival times are i.i.d. (\mathrm{Exp}(\lambda))
Gamma: (\mathrm{Exp}(\lambda)) is (\mathrm{Gamma}(k=1,\,\lambda)) (shape 1)
Geometric (discrete analogue): counts trials until first success; exponential is “continuous waiting time” analogue
Min of exponentials: if (X_i \sim \mathrm{Exp}(\lambda_i)) independent, then (\min_i X_i \sim \mathrm{Exp}(\sum_i \lambda_i))
Memorylessness (core intuition)#
For (X \sim \mathrm{Exp}(\lambda)), for (s,t \ge 0):
[ \mathbb{P}(X > s+t \mid X > s) = \mathbb{P}(X > t). ]
This is why the exponential distribution is often used as the “default” waiting-time model.
A one-line proof uses the survival function (S(x)=\mathbb{P}(X>x)=e^{-\lambda x}):
[ \mathbb{P}(X>s+t\mid X>s)=\frac{S(s+t)}{S(s)}=\frac{e^{-\lambda(s+t)}}{e^{-\lambda s}}=e^{-\lambda t}=\mathbb{P}(X>t). ]
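This identity is also easy to verify by simulation; a minimal NumPy-only sketch (the values of `lam`, `s`, and `t` are arbitrary illustrations):

```python
import numpy as np

# Monte Carlo check of memorylessness: P(X > s+t | X > s) ≈ P(X > t).
# lam, s, t are illustrative values, not fixed by the text.
rng = np.random.default_rng(0)
lam, s, t = 1.5, 0.4, 0.8
x = rng.exponential(scale=1 / lam, size=1_000_000)

cond = (x > s + t).sum() / (x > s).sum()  # P(X > s+t | X > s)
uncond = (x > t).mean()                   # P(X > t)
print(cond, uncond, np.exp(-lam * t))     # all three should be close
```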
3) Formal definition#
Let (X \sim \mathrm{Exp}(\lambda)) with (\lambda>0).
PDF#
[ f(x\mid\lambda) = \lambda e^{-\lambda x}, \quad x \ge 0. ]
(And (f(x\mid\lambda)=0) for (x<0).)
CDF#
[ F(x\mid\lambda) = \mathbb{P}(X \le x) = 1 - e^{-\lambda x}, \quad x \ge 0. ]
Survival function#
[ S(x\mid\lambda) = \mathbb{P}(X > x) = e^{-\lambda x}. ]
Hazard rate#
[ h(x) = \frac{f(x)}{S(x)} = \lambda. ]
The hazard is constant, which is a strong modeling assumption.
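A quick numerical confirmation that the hazard does not depend on (x) (the rate here is an arbitrary example):

```python
import numpy as np
from scipy import stats

# Check that f(x)/S(x) is constant in x for the exponential (illustrative λ).
lam = 1.7
rv = stats.expon(scale=1 / lam)
x = np.linspace(0.1, 5, 50)
hazard = rv.pdf(x) / rv.sf(x)  # sf() is the survival function S(x)
print(hazard.min(), hazard.max())  # both ≈ 1.7, independent of x
```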
4) Moments & properties#
Moments#
For (X \sim \mathrm{Exp}(\lambda)):
Mean: (\mathbb{E}[X] = \tfrac{1}{\lambda})
Variance: (\mathrm{Var}(X) = \tfrac{1}{\lambda^2})
Skewness: (2)
(Excess) kurtosis: (6) (so kurtosis (= 9))
Median: (\tfrac{\ln 2}{\lambda})
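These values can be cross-checked against SciPy's moment machinery (λ is an arbitrary example; recall SciPy's scale is (1/\lambda)):

```python
import numpy as np
from scipy import stats

# Cross-check the listed moments and median against SciPy (illustrative λ).
lam = 2.0
rv = stats.expon(scale=1 / lam)
mean, var, skew, kurt = rv.stats(moments="mvsk")  # kurt is *excess* kurtosis
print(mean, var, skew, kurt)            # 1/λ, 1/λ², 2, 6
print(rv.median(), np.log(2) / lam)     # both equal ln(2)/λ
```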
MGF and characteristic function#
MGF (for (t < \lambda)): [ M_X(t) = \mathbb{E}[e^{tX}] = \frac{\lambda}{\lambda - t}. ]
Characteristic function: [ \varphi_X(t) = \mathbb{E}[e^{itX}] = \frac{\lambda}{\lambda - it}. ]
Entropy (differential, in nats)#
[ H(X) = 1 - \ln \lambda. ]
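A one-line check of this formula against SciPy (λ is arbitrary):

```python
import math
from scipy import stats

# The differential entropy 1 - ln(λ) should match SciPy's expon.entropy()
# when scale = 1/λ (illustrative λ).
lam = 3.0
h_formula = 1 - math.log(lam)
h_scipy = float(stats.expon(scale=1 / lam).entropy())
print(h_formula, h_scipy)  # identical up to floating-point error
```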
Other notable properties#
Mode: (0)
Memoryless: (\mathbb{P}(X>s+t \mid X>s)=\mathbb{P}(X>t))
Min stability: (\min_i X_i) is exponential with rate (\sum_i \lambda_i) (independent case)
Sums: if (X_1,\dots,X_n) i.i.d. (\mathrm{Exp}(\lambda)), then (\sum_i X_i \sim \mathrm{Gamma}(n,\lambda))
def exp_pdf(x: np.ndarray, rate: float) -> np.ndarray:
    x = np.asarray(x)
    return np.where(x >= 0, rate * np.exp(-rate * x), 0.0)

def exp_cdf(x: np.ndarray, rate: float) -> np.ndarray:
    x = np.asarray(x)
    return np.where(x >= 0, 1.0 - np.exp(-rate * x), 0.0)

def sample_expon_inverse(n: int, rate: float, rng: np.random.Generator) -> np.ndarray:
    '''NumPy-only inverse-CDF sampling for Exp(rate).

    If U ~ Uniform(0, 1), then X = -log(1-U)/rate ~ Exp(rate).
    We use log1p for numerical stability when U is close to 0.
    '''
    if rate <= 0:
        raise ValueError("rate must be > 0")
    u = rng.random(n)
    return -np.log1p(-u) / rate

def exp_loglik(rate: float, x: np.ndarray) -> float:
    x = np.asarray(x)
    if rate <= 0 or np.any(x < 0):
        return -np.inf
    return x.size * math.log(rate) - rate * float(x.sum())
# Quick Monte Carlo check of mean/variance
rate = 1.7
x_mc = sample_expon_inverse(200_000, rate=rate, rng=rng)
x_mc.mean(), x_mc.var(), (1 / rate), (1 / rate**2)
(0.5876066849609238,
0.34170872750465553,
0.5882352941176471,
0.34602076124567477)
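The min-stability property listed above can be checked the same way; a sketch with arbitrarily chosen rates:

```python
import numpy as np

# Check min-stability: min(X1, X2, X3) with rates λi is Exp(λ1 + λ2 + λ3),
# so its mean is 1/(λ1+λ2+λ3). Rates are illustrative.
rng = np.random.default_rng(1)
lams = np.array([0.5, 1.2, 2.3])
n = 200_000
xs = rng.exponential(scale=1 / lams, size=(n, 3))  # scale broadcasts per column
mins = xs.min(axis=1)
print(mins.mean(), 1 / lams.sum())  # both ≈ 0.25
```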
5) Parameter interpretation#
Rate (\lambda)#
Larger (\lambda) (\Rightarrow) events happen “more frequently” (\Rightarrow) shorter typical waiting times.
(\mathbb{E}[X] = 1/\lambda), so (\lambda) is the inverse of the mean waiting time.
Scale (\theta)#
(\theta = 1/\lambda)
Larger (\theta) (\Rightarrow) waiting times are typically longer.
Shape changes#
All exponential PDFs have their mode at 0 and decay exponentially.
The decay rate is controlled entirely by (\lambda).
rates = [0.5, 1.0, 2.0]
x = np.linspace(0, 8, 400)
fig = go.Figure()
for r in rates:
    fig.add_trace(
        go.Scatter(
            x=x,
            y=exp_pdf(x, r),
            mode="lines",
            name=f"λ={r:g} (mean={1/r:g})",
        )
    )
    fig.add_vline(x=1 / r, line_dash="dot", opacity=0.35)  # mark each mean
fig.update_layout(
title="Exponential PDF for different rates",
xaxis_title="x",
yaxis_title="f(x)",
)
fig.show()
6) Derivations#
We derive (\mathbb{E}[X]), (\mathrm{Var}(X)), and the likelihood/MLE.
Expectation#
[ \mathbb{E}[X] = \int_0^\infty x \,\lambda e^{-\lambda x}\,dx. ]
Integration by parts: let (u=x), (dv=\lambda e^{-\lambda x}\,dx). Then (du=dx) and (v=-e^{-\lambda x}).
[ \mathbb{E}[X] = \left[-x e^{-\lambda x}\right]_0^\infty + \int_0^\infty e^{-\lambda x}\,dx = 0 + \left[\frac{-1}{\lambda}e^{-\lambda x}\right]_0^\infty = \frac{1}{\lambda}. ]
Second moment and variance#
Compute (\mathbb{E}[X^2]): [ \mathbb{E}[X^2] = \int_0^\infty x^2 \,\lambda e^{-\lambda x}\,dx. ]
Integration by parts again: let (u=x^2), (dv=\lambda e^{-\lambda x}\,dx). Then (du=2x\,dx) and (v=-e^{-\lambda x}).
[ \mathbb{E}[X^2] = \left[-x^2 e^{-\lambda x}\right]_0^\infty + \int_0^\infty 2x\, e^{-\lambda x}\,dx. ]
Now compute (\int_0^\infty x e^{-\lambda x}\,dx) (integration by parts with (u=x), (dv=e^{-\lambda x}dx), (v=-\tfrac{1}{\lambda}e^{-\lambda x})):
[ \int_0^\infty x e^{-\lambda x}\,dx = \left[\frac{-x}{\lambda}e^{-\lambda x}\right]_0^\infty + \int_0^\infty \frac{1}{\lambda}e^{-\lambda x}\,dx = 0 + \frac{1}{\lambda^2}. ]
So: [ \mathbb{E}[X^2] = 2\cdot \frac{1}{\lambda^2} = \frac{2}{\lambda^2}. ]
Therefore: [ \mathrm{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}. ]
Likelihood and MLE#
Given i.i.d. samples (x_1,\dots,x_n) with (x_i\ge 0):
[ L(\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n \exp\left(-\lambda \sum_{i=1}^n x_i\right). ]
Log-likelihood: [ \ell(\lambda) = n\ln\lambda - \lambda \sum_{i=1}^n x_i. ]
Differentiate and set to zero: [ \ell'(\lambda) = \frac{n}{\lambda} - \sum_{i=1}^n x_i = 0 \quad\Rightarrow\quad \hat{\lambda}_{\text{MLE}} = \frac{n}{\sum_i x_i} = \frac{1}{\bar{x}}. ]
Equivalently, in scale form (\hat{\theta} = \bar{x}).
# MLE demo on simulated data
true_rate = 1.25
n = 400
x = sample_expon_inverse(n, rate=true_rate, rng=rng)
rate_mle = n / x.sum()
loglik_true = exp_loglik(true_rate, x)
loglik_mle = exp_loglik(rate_mle, x)
true_rate, rate_mle, loglik_true, loglik_mle
(1.25, 1.2681344724009203, -305.02253233559685, -304.9812395274336)
7) Sampling & simulation (NumPy-only)#
Inverse transform sampling#
If (U \sim \mathrm{Uniform}(0,1)), then using the CDF (F(x)=1-e^{-\lambda x}), we solve (U = F(X)):
[ U = 1 - e^{-\lambda X} \quad\Rightarrow\quad 1-U = e^{-\lambda X} \quad\Rightarrow\quad X = -\frac{\ln(1-U)}{\lambda}. ]
That yields a simple algorithm:
draw (U\in(0,1))
return (X = -\ln(1-U)/\lambda)
Numerical note: use log1p(-U) to avoid catastrophic cancellation when (U) is tiny.
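To see why this matters, compare log(1-u) with log1p(-u) for a tiny (u) (the value is illustrative):

```python
import numpy as np

# For tiny u, 1 - u rounds to exactly 1.0 in double precision, so log(1 - u)
# returns 0.0 and the information in u is lost; log1p(-u) keeps full precision.
u = 1e-18
print(np.log(1 - u))   # 0.0  (catastrophic loss)
print(np.log1p(-u))    # ≈ -1e-18 (correct to machine precision)
```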
# Sampling: compare histogram to the true PDF
rate = 1.7
n = 50_000
samples = sample_expon_inverse(n, rate=rate, rng=rng)
x_grid = np.linspace(0, np.quantile(samples, 0.995), 400)
fig = px.histogram(
samples,
nbins=60,
histnorm="probability density",
title=f"Monte Carlo samples vs PDF (n={n}, λ={rate:g})",
labels={"value": "x"},
)
fig.add_trace(
go.Scatter(x=x_grid, y=exp_pdf(x_grid, rate), mode="lines", name="true pdf")
)
fig.update_layout(yaxis_title="density")
fig.show()
8) Visualization (PDF, CDF, Monte Carlo)#
We’ll visualize:
the PDF for multiple rates
the CDF and an empirical CDF from Monte Carlo samples
# PDF and CDF side-by-side for multiple rates
rates = [0.5, 1.0, 2.0]
x = np.linspace(0, 8, 500)
fig_pdf = go.Figure()
fig_cdf = go.Figure()
for r in rates:
    fig_pdf.add_trace(go.Scatter(x=x, y=exp_pdf(x, r), mode="lines", name=f"λ={r:g}"))
    fig_cdf.add_trace(go.Scatter(x=x, y=exp_cdf(x, r), mode="lines", name=f"λ={r:g}"))
fig_pdf.update_layout(title="Exponential PDF", xaxis_title="x", yaxis_title="f(x)")
fig_cdf.update_layout(title="Exponential CDF", xaxis_title="x", yaxis_title="F(x)")
fig_pdf.show()
fig_cdf.show()
# Empirical CDF vs true CDF
rate = 1.7
n = 20_000
samples = sample_expon_inverse(n, rate=rate, rng=rng)
xs = np.sort(samples)
ys = np.arange(1, n + 1) / n
x_grid = np.linspace(0, np.quantile(xs, 0.995), 400)
fig = go.Figure()
fig.add_trace(go.Scatter(x=xs, y=ys, mode="lines", name="empirical CDF"))
fig.add_trace(go.Scatter(x=x_grid, y=exp_cdf(x_grid, rate), mode="lines", name="true CDF"))
fig.update_layout(
title=f"Empirical CDF vs true CDF (n={n}, λ={rate:g})",
xaxis_title="x",
yaxis_title="F(x)",
)
fig.show()
9) SciPy integration (scipy.stats.expon)#
SciPy uses a location-scale parameterization:
expon(loc=ℓ, scale=θ) has support (x\ge \ell)
for the canonical exponential with rate (\lambda): set (\ell=0), (\theta=1/\lambda)
So if you think in rates, remember: [ \lambda = 1/\text{scale}. ]
from scipy.stats import expon
rate = 1.7
scale = 1 / rate
rv = expon(loc=0.0, scale=scale)
x_grid = np.linspace(0, 6, 400)
pdf_scipy = rv.pdf(x_grid)
cdf_scipy = rv.cdf(x_grid)
# SciPy sampling
samples_scipy = rv.rvs(size=10_000, random_state=rng)
# Fit: for an exponential with loc=0, the MLE for scale is the sample mean.
loc_hat, scale_hat = expon.fit(samples_scipy, floc=0.0)
rate_hat = 1 / scale_hat
(loc_hat, scale_hat, rate_hat)
(0.0, 0.593214983382439, 1.6857295045012564)
# Compare SciPy PDF to our PDF implementation
rate = 1.7
scale = 1 / rate
rv = expon(loc=0.0, scale=scale)
x_grid = np.linspace(0, 6, 400)
max_abs_diff = np.max(np.abs(rv.pdf(x_grid) - exp_pdf(x_grid, rate)))
max_abs_diff
1.1102230246251565e-16
10) Statistical use cases#
A) Hypothesis testing (rate parameter)#
If (X_1,\dots,X_n) are i.i.d. (\mathrm{Exp}(\lambda)), then the sum (S = \sum_i X_i) has a Gamma distribution:
[ S \sim \mathrm{Gamma}(\text{shape}=n,;\text{rate}=\lambda). ]
This gives an exact way to test hypotheses about (\lambda) using (S).
B) Bayesian modeling (Gamma prior on (\lambda))#
A Gamma prior is conjugate:
prior: (\lambda \sim \mathrm{Gamma}(\alpha_0,\beta_0)) (rate (\beta_0))
likelihood: i.i.d. exponential data
posterior: (\lambda \mid x \sim \mathrm{Gamma}(\alpha_0+n,\beta_0+\sum x_i))
C) Generative modeling (Poisson process)#
A Poisson process can be generated by sampling exponential inter-arrival times and cumulatively summing them.
# A) Exact test for H0: λ = λ0 using S = sum(X_i)
true_rate = 1.25
n = 50
x = sample_expon_inverse(n, rate=true_rate, rng=rng)
S = x.sum()
lambda0 = 1.0 # null hypothesis
# Under H0: S ~ Gamma(shape=n, scale=1/lambda0)
S_dist = stats.gamma(a=n, scale=1 / lambda0)
cdf = S_dist.cdf(S)
# Two-sided p-value by tail probability (simple symmetric-tail construction)
p_two_sided = 2 * min(cdf, 1 - cdf)
# Likelihood ratio test (asymptotic chi^2_1)
rate_mle = n / S
lrt = 2 * (exp_loglik(rate_mle, x) - exp_loglik(lambda0, x))
p_lrt = 1 - stats.chi2(df=1).cdf(lrt)
{
"S": S,
"rate_mle": rate_mle,
"p_two_sided_exact": p_two_sided,
"lrt_stat": lrt,
"p_lrt_asymptotic": p_lrt,
}
{'S': 43.80640950151249,
'rate_mle': 1.1413854860274195,
'p_two_sided_exact': 0.3857960784607864,
'lrt_stat': 0.8371053131042316,
'p_lrt_asymptotic': 0.3602259671779372}
# B) Bayesian update: Gamma prior on λ
# Prior: λ ~ Gamma(alpha0, beta0) in (shape, rate) form
alpha0 = 2.0
beta0 = 1.0
# Data
true_rate = 1.25
n = 40
x = sample_expon_inverse(n, rate=true_rate, rng=rng)
S = x.sum()
# Posterior: λ | x ~ Gamma(alpha0+n, beta0+S)
alpha_post = alpha0 + n
beta_post = beta0 + S
lam_grid = np.linspace(0, 4, 500)
prior_pdf = stats.gamma(a=alpha0, scale=1 / beta0).pdf(lam_grid)
post_pdf = stats.gamma(a=alpha_post, scale=1 / beta_post).pdf(lam_grid)
fig = go.Figure()
fig.add_trace(go.Scatter(x=lam_grid, y=prior_pdf, mode="lines", name="prior"))
fig.add_trace(go.Scatter(x=lam_grid, y=post_pdf, mode="lines", name="posterior"))
fig.add_vline(x=true_rate, line_dash="dot", opacity=0.5)
fig.update_layout(
title="Gamma prior → Gamma posterior for the exponential rate λ",
xaxis_title="λ",
yaxis_title="density",
)
fig.show()
post_mean = alpha_post / beta_post
post_mean
1.1823669274709439
# C) Poisson process simulation via exponential inter-arrival times
def simulate_poisson_process(rate: float, T: float, rng: np.random.Generator) -> np.ndarray:
    if rate <= 0:
        raise ValueError("rate must be > 0")
    if T <= 0:
        raise ValueError("T must be > 0")
    t = 0.0
    times: list[float] = []
    while True:
        t += sample_expon_inverse(1, rate=rate, rng=rng)[0]
        if t > T:
            break
        times.append(t)
    return np.asarray(times)
rate = 1.2
T = 10.0
arrival_times = simulate_poisson_process(rate=rate, T=T, rng=rng)
# Build a step plot for N(t)
t_grid = np.linspace(0, T, 400)
counts = np.searchsorted(arrival_times, t_grid, side="right")
fig = go.Figure()
fig.add_trace(go.Scatter(x=t_grid, y=counts, mode="lines", name="N(t)"))
fig.update_layout(
title=f"One Poisson process sample path (λ={rate:g}, T={T:g})",
xaxis_title="time t",
yaxis_title="count N(t)",
)
fig.show()
len(arrival_times), arrival_times[:5]
(15, array([0.0376, 0.8452, 1.5469, 2.4493, 3.0343]))
11) Pitfalls#
Rate vs scale confusion: many texts use the rate (\lambda); SciPy's expon uses scale = 1/\lambda.
Location shifts: expon(loc=ℓ, ...) changes the support to ([\ell,\infty)). If your model assumes support ([0,\infty)), fit with floc=0.
Nonnegative data: the exponential likelihood is zero for negative observations. If you see negatives, your preprocessing/model is inconsistent.
Censoring/truncation: survival/reliability data is often right-censored; naive MLE on observed times can be biased.
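A small simulation illustrates the censoring bias (the censoring time c and the rate are illustrative):

```python
import numpy as np

# Right-censor Exp(λ) data at time c and compare the naive MLE (which treats
# censored times as exact failure times) with the correct censored-data MLE,
# d / T, where d = number of observed events and T = total time at risk.
rng = np.random.default_rng(2)
true_rate, c, n = 1.0, 1.0, 100_000
x = rng.exponential(scale=1 / true_rate, size=n)
obs = np.minimum(x, c)       # recorded times, truncated at the censoring point
event = x <= c               # True if the failure was actually observed

naive_mle = 1 / obs.mean()              # biased upward: censored times look short
censored_mle = event.sum() / obs.sum()  # correct MLE under right-censoring
print(naive_mle, censored_mle, true_rate)
```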
Numerical issues: use log1p(-u) for the inverse transform when (u) is close to 0; for large (\lambda x), exp(-λx) underflows, so use the log-PDF (scipy.stats.expon.logpdf) when needed.
Goodness-of-fit tests with fitted parameters: a plain KS test after fitting parameters is not distribution-free (Lilliefors-type corrections apply).
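A quick simulation of this last point (replication counts are illustrative): fitting the scale and then running a plain KS test against the fitted distribution yields p-values that pile up near 1 instead of being Uniform(0,1), so the test is far too conservative.

```python
import numpy as np
from scipy import stats

# Under H0 with *known* parameters, KS p-values would be ~Uniform(0,1).
# With the scale estimated from the same data, they concentrate near 1.
rng = np.random.default_rng(3)
pvals = []
for _ in range(300):
    x = rng.exponential(scale=1.0, size=100)
    scale_hat = x.mean()  # MLE of the scale, fitted on the same sample
    pvals.append(stats.kstest(x, stats.expon(scale=scale_hat).cdf).pvalue)
pvals = np.array(pvals)
print(np.mean(pvals), np.mean(pvals < 0.05))  # mean well above 0.5; far fewer
                                              # than 5% of p-values below 0.05
```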
12) Summary#
expon is a continuous distribution on ([0,\infty)) used for memoryless waiting times.
PDF/CDF: (f(x)=\lambda e^{-\lambda x}), (F(x)=1-e^{-\lambda x}) for (x\ge 0).
Mean/variance: (\mathbb{E}[X]=1/\lambda), (\mathrm{Var}(X)=1/\lambda^2); skewness (=2), excess kurtosis (=6).
Inverse-CDF sampling is simple: (X=-\ln(1-U)/\lambda).
SciPy mapping: expon(scale=1/λ) (and optionally loc).
Common use cases include hypothesis tests about rates, conjugate Bayesian updates (Gamma prior), and simulating Poisson processes.